ABSTRACT
Digital extraction of label data from natural history specimens, together with more efficient data entry and processing procedures, is essential for improving documentation and the global availability of information. Herbaria have recently made great advances in this direction. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we advance further towards fully automatic extraction of label data from herbarium specimen images. The system can be developed and run on a consumer-grade desktop computer with standard specifications, and can also be applied to extracting label data from other kinds of natural history specimens, such as those in entomological collections. It can facilitate the digitization and publication of natural history museum specimens around the world.
Subjects
Documentation, Museums, Factual Databases, Entomology
ABSTRACT
An annotated English translation of an early 19th-century German text, including its Latin diagnoses, is presented together with a high-quality scan of the original publication and direct links to the cited pages, with taxon and literature citations (including TL-2 entries).